{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Overlay Tutorial\n",
    "\n",
    "This notebook gives an overview of how the Overlay class should be used efficiently.\n",
    "\n",
    "The redesigned Overlay class has three main design goals\n",
    " * Allow overlay users to find out what is inside an overlay in a consistent manner\n",
    " * Provide a simple way for developers of new hardware designs to test new IP\n",
    " * Facilitate reuse of IP between Overlays\n",
    " \n",
    "This tutorial is primarily designed to demonstrate the final two points, walking through the process of interacting with a new IP, developing a driver, and finally building a more complex system from multiple IP blocks. All of the code and block diagrams can be found at [https://github.com/PeterOgden/overlay_tutorial]. For these examples to work copy the contents of the overlays directory into the home directory on the PYNQ-Z1 board."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Developing a Single IP\n",
    "\n",
    "For this first example we are going to use a simple design with a single IP contained in it. This IP was developed using HLS and adds two 32-bit integers together. The full code for the accelerator is:\n",
    "\n",
    "```C++\n",
    "void add(int a, int b, int& c) {\n",
    "#pragma HLS INTERFACE ap_ctrl_none port=return\n",
    "#pragma HLS INTERFACE s_axilite port=a\n",
    "#pragma HLS INTERFACE s_axilite port=b\n",
    "#pragma HLS INTERFACE s_axilite port=c\n",
    "\n",
    "\tc = a + b;\n",
    "}\n",
    "```\n",
    "\n",
    "With a block diagram consisting solely of the HLS IP and required glue logic to connect it to the ZYNQ7 IP\n",
    "\n",
    "![Simple Block Diagram](../images/attribute1.png)\n",
    "\n",
    "To interact with the IP first we need to load the overlay containing the IP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "\n",
       "require(['notebook/js/codecell'], function(codecell) {\n",
       "  codecell.CodeCell.options_default.highlight_modes[\n",
       "      'magic_text/x-csrc'] = {'reg':[/^%%microblaze/]};\n",
       "  Jupyter.notebook.events.one('kernel_ready.Kernel', function(){\n",
       "      Jupyter.notebook.get_cells().map(function(cell){\n",
       "          if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;\n",
       "  });\n",
       "});\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from pynq import Overlay\n",
    "\n",
    "overlay = Overlay('/home/xilinx/tutorial_1.bit')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Creating the overlay will automatically download it. We can now use a question mark to find out what is in the overlay."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "overlay?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All of the entries are accessible via attributes on the overlay class with the specified driver. Accessing the `scalar_add` attribute of the will create a driver for the IP - as there is no driver currently known for the `Add` IP core `DefaultIP` driver will be used so we can interact with IP core."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "add_ip = overlay.scalar_add\n",
    "add_ip?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By providing the HWH file along with overlay we can also expose the register map associated with IP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RegisterMap {\n",
       "  a = Register(a=0),\n",
       "  b = Register(b=0),\n",
       "  c = Register(c=0),\n",
       "  c_ctrl = Register(c_ap_vld=1, RESERVED=0)\n",
       "}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "add_ip.register_map"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can interact with the IP using the register map directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Register(c=7)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "add_ip.register_map.a = 3\n",
    "add_ip.register_map.b = 4\n",
    "add_ip.register_map.c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively by reading the driver source code generated by HLS we can determine that offsets we need to write the two arguments are at offsets `0x10` and `0x18` and the result can be read back from `0x20`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "add_ip.write(0x10, 4)\n",
    "add_ip.write(0x18, 5)\n",
    "add_ip.read(0x20)"
   ]
  },
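  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The register offsets used above can also be modelled in software, which can be a convenient way to check driver logic when no board is to hand. The following is a minimal sketch and not part of PYNQ: `FakeAddIP` and `add_via_registers` are hypothetical names, and the model assumes only the offsets `0x10`, `0x18` and `0x20` quoted above plus 32-bit wrap-around arithmetic.\n",
    "\n",
    "```python\n",
    "# Hypothetical software stand-in for the adder's AXI4-Lite registers.\n",
    "class FakeAddIP:\n",
    "    A_OFFSET = 0x10  # first operand\n",
    "    B_OFFSET = 0x18  # second operand\n",
    "    C_OFFSET = 0x20  # result, read-only in this model\n",
    "\n",
    "    def __init__(self):\n",
    "        self._regs = {self.A_OFFSET: 0, self.B_OFFSET: 0}\n",
    "\n",
    "    def write(self, offset, value):\n",
    "        if offset not in self._regs:\n",
    "            raise ValueError('offset 0x%x is not writable' % offset)\n",
    "        # Keep only the low 32 bits, as a 32-bit register would (assumption).\n",
    "        self._regs[offset] = value & 0xFFFFFFFF\n",
    "\n",
    "    def read(self, offset):\n",
    "        if offset == self.C_OFFSET:\n",
    "            total = self._regs[self.A_OFFSET] + self._regs[self.B_OFFSET]\n",
    "            return total & 0xFFFFFFFF\n",
    "        if offset in self._regs:\n",
    "            return self._regs[offset]\n",
    "        raise ValueError('offset 0x%x is not readable' % offset)\n",
    "\n",
    "\n",
    "def add_via_registers(ip, a, b):\n",
    "    # Same write/write/read sequence as the cell above.\n",
    "    ip.write(0x10, a)\n",
    "    ip.write(0x18, b)\n",
    "    return ip.read(0x20)\n",
    "\n",
    "\n",
    "print(add_via_registers(FakeAddIP(), 4, 5))  # 9\n",
    "```\n",
    "\n",
    "Because the model exposes the same `read` and `write` calls used above, logic written against it should carry over to the real IP unchanged."
   ]
  },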
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating a Driver\n",
    "\n",
    "While the `UnknownIP` driver is useful for determining that the IP is working it is not the most user-friendly API to expose to the eventual end-users of the overlay. Ideally we want to create an IP-specific driver exposing a single `add` function to call the accelerator. Custom drivers are created by inheriting from `UnknownIP` and adding a `bindto` class attribute consisting of the IP types the driver should bind to. The constructor of the class should take a single `description` parameter and pass it through to the super class `__init__`. The description is a dictionary containing the address map and any interrupts and GPIO pins connected to the IP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pynq import DefaultIP\n",
    "\n",
    "class AddDriver(DefaultIP):\n",
    "    def __init__(self, description):\n",
    "        super().__init__(description=description)\n",
    "    \n",
    "    bindto = ['xilinx.com:hls:add:1.0']\n",
    "    \n",
    "    def add(self, a, b):\n",
    "        self.write(0x10, a)\n",
    "        self.write(0x18, b)\n",
    "        return self.read(0x20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now if we reload the overlay and query the help again we can see that our new driver is bound to the IP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "overlay = Overlay('/home/xilinx/tutorial_1.bit')\n",
    "overlay?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can access the same way as before except now our custom driver with an `add` function is created instead of `DefaultIP`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "overlay.scalar_add.add(15,20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reusing IP\n",
    "\n",
    "Suppose we or someone else develops a new overlay and wants to reuse the existing IP. As long as they import the python file containing the driver class the drivers will be automatically created. As an example consider the next design which, among other things includes a renamed version of the `scalar_add` IP.\n",
    "\n",
    "![Second Block Diagram](../images/attribute2.png)\n",
    "\n",
    "Using the question mark on the new overlay shows that the driver is still bound."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "overlay = Overlay('/home/xilinx/tutorial_2.bit')\n",
    "overlay?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IP Hierarchies\n",
    "\n",
    "The block diagram above also contains a hierarchy `const_multiply`, which looks like this:\n",
    "\n",
    "![Hierarchy](../images/hierarchy.png)\n",
    "\n",
    "Said hierarchy contains a custom IP with an input and output stream, an AXI4-Lite interface as well as a DMA engine for transferring the data. The custom IP multiply the value of `data` in the input stream by `constant` and outputs the result without modifying the rest of signals. As streams are involved we need to handle `TLAST` appropriately for the DMA engine. The HLS code is a little bit more complex with additional pragmas and types but the complete code is still relatively short.\n",
    "\n",
    "```C\n",
    "#include \"ap_axi_sdata.h\"\n",
    "typedef ap_axiu<32,1,1,1> stream_type;\n",
    "\n",
    "void mult_constant(stream_type* in_data, stream_type* out_data, ap_int<32> constant) {\n",
    "#pragma HLS INTERFACE s_axilite register port=constant\n",
    "#pragma HLS INTERFACE ap_ctrl_none port=return\n",
    "#pragma HLS INTERFACE axis port=in_data\n",
    "#pragma HLS INTERFACE axis port=out_data\n",
    "\tout_data->data = in_data->data * constant;\n",
    "\tout_data->dest = in_data->dest;\n",
    "\tout_data->id = in_data->id;\n",
    "\tout_data->keep = in_data->keep;\n",
    "\tout_data->last = in_data->last;\n",
    "\tout_data->strb = in_data->strb;\n",
    "\tout_data->user = in_data->user;\n",
    "\n",
    "}\n",
    "```\n",
    "\n",
    "Looking at the HLS generated documentation we again discover that to set the constant we need to set the register at offset `0x10` so we can write a simple driver for this purpose"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ConstantMultiplyDriver(DefaultIP):\n",
    "    def __init__(self, description):\n",
    "        super().__init__(description=description)\n",
    "    \n",
    "    bindto = ['xilinx.com:hls:mult_constant:1.0']\n",
    "    \n",
    "    @property\n",
    "    def constant(self):\n",
    "        return self.read(0x10)\n",
    "    \n",
    "    @constant.setter\n",
    "    def constant(self, value):\n",
    "        self.write(0x10, value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DMA engine driver is already included inside the PYNQ driver so nothing special is needed for that other than ensuring the module is imported. Reloading the overlay will make sure that our newly written driver is available for use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pynq.lib.dma\n",
    "\n",
    "overlay = Overlay('/home/xilinx/tutorial_2.bit')\n",
    "\n",
    "dma = overlay.const_multiply.multiply_dma\n",
    "multiply = overlay.const_multiply.multiply"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DMA driver transfers numpy arrays allocated using `pynq.allocate`. Lets test the system by multiplying 5 numbers by 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ContiguousArray([ 0,  3,  6,  9, 12], dtype=uint32)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pynq import allocate\n",
    "import numpy as np\n",
    "\n",
    "in_buffer = allocate(shape=(5,), dtype=np.uint32)\n",
    "out_buffer = allocate(shape=(5,), dtype=np.uint32)\n",
    "\n",
    "for i in range(5):\n",
    "    in_buffer[i] = i\n",
    "\n",
    "multiply.constant = 3\n",
    "dma.sendchannel.transfer(in_buffer)\n",
    "dma.recvchannel.transfer(out_buffer)\n",
    "dma.sendchannel.wait()\n",
    "dma.recvchannel.wait()\n",
    "\n",
    "out_buffer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While this is one way to use the IP, it still isn't exactly user-friendly. It would be preferable to treat the entire hierarchy as a single entity and write a driver that hides the implementation details. The overlay class allows for drivers to be written against hierarchies as well as IP but the details are slightly different.\n",
    "\n",
    "Hierarchy drivers are subclasses of `pynq.DefaultHierarchy` and, similar to `DefaultIP` have a constructor that takes a description of hierarchy. To determine whether the driver should bind to a particular hierarchy the class should also contain a static `checkhierarchy` method which takes the description of a hierarchy and returns `True` if the driver should be bound or `False` if not. Similar to `DefaultIP`, any classes that meet the requirements of subclasses `DefaultHierarchy` and have a `checkhierarchy` method will automatically be registered.\n",
    "\n",
    "For our constant multiply hierarchy this would look something like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pynq import DefaultHierarchy\n",
    "\n",
    "class StreamMultiplyDriver(DefaultHierarchy):\n",
    "    def __init__(self, description):\n",
    "        super().__init__(description)\n",
    "        \n",
    "    def stream_multiply(self, stream, constant):\n",
    "        self.multiply.constant = constant\n",
    "        with allocate(shape=(len(stream),), \\\n",
    "                      dtype=np.uint32) as in_buffer,\\\n",
    "             allocate(shape=(len(stream),), \\\n",
    "                      dtype=np.uint32) as out_buffer:\n",
    "            for i, v, in enumerate(stream):\n",
    "                in_buffer[i] = v\n",
    "            self.multiply_dma.sendchannel.transfer(in_buffer)\n",
    "            self.multiply_dma.recvchannel.transfer(out_buffer)\n",
    "            self.multiply_dma.sendchannel.wait()\n",
    "            self.multiply_dma.recvchannel.wait()\n",
    "            result = out_buffer.copy()\n",
    "        return result\n",
    "\n",
    "    @staticmethod\n",
    "    def checkhierarchy(description):\n",
    "        if 'multiply_dma' in description['ip'] \\\n",
    "           and 'multiply' in description['ip']:\n",
    "            return True\n",
    "        return False"
   ]
  },
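  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binding rule in `checkhierarchy` only inspects the keys of `description['ip']`, so it can be tried out on plain dictionaries without loading an overlay. The following is a minimal sketch under that assumption: `has_multiply_blocks` is a hypothetical stand-alone copy of the check, and the two descriptions are invented for illustration rather than taken from a real HWH file.\n",
    "\n",
    "```python\n",
    "def has_multiply_blocks(description):\n",
    "    # Mirrors the check above: both IP blocks must be present.\n",
    "    ip = description['ip']\n",
    "    return 'multiply_dma' in ip and 'multiply' in ip\n",
    "\n",
    "\n",
    "# Hypothetical descriptions; only the keys of 'ip' matter for this check.\n",
    "matching = {'ip': {'multiply_dma': {}, 'multiply': {}}}\n",
    "missing_dma = {'ip': {'multiply': {}}}\n",
    "\n",
    "print(has_multiply_blocks(matching))     # True\n",
    "print(has_multiply_blocks(missing_dma))  # False\n",
    "```\n",
    "\n",
    "A check this loose binds to any hierarchy that happens to contain IP with these two names, so a stricter rule may be preferable when several similar hierarchies exist."
   ]
  },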
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now reload the overlay and ensure the higher-level driver is loaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "overlay = Overlay('/home/xilinx/tutorial_2.bit')\n",
    "overlay?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and use it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ContiguousArray([ 5, 10, 15, 20, 25], dtype=uint32)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "overlay.const_multiply.stream_multiply([1,2,3,4,5], 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overlay Customisation\n",
    "\n",
    "While the default overlay is sufficient for many use cases, some overlays will require more customisation to provide a user-friendly API. As an example the default AXI GPIO drivers expose channels 1 and 2 as separate attributes meaning that accessing the LEDs in the base overlay requires the following contortion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "base = Overlay('base.bit')\n",
    "base.leds_gpio.channel1[0].on()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To mitigate this the overlay developer can provide a custom class for their overlay to expose the subsystems in a more user-friendly way.  The base overlay includes custom overlay class which performs the following functions:\n",
    " * Make the AXI GPIO devices better named and range/direction restricted\n",
    " * Make the IOPs accessible through the `pmoda`, `pmodb` and `ardiuno` names\n",
    " * Create a special class to interact with RGB LEDs\n",
    " \n",
    "The result is that the LEDs can be accessed like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pynq.overlays.base import BaseOverlay\n",
    "\n",
    "base = BaseOverlay('base.bit')\n",
    "base.leds[0].on()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using a well defined class also allows for custom docstrings to be provided also helping end users."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "base?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a custom overlay\n",
    "\n",
    "Custom overlay classes should inherit from `pynq.UnknownOverlay` taking a the full path of the bitstream file and possible additional keyword arguments. These parameters should be passed to `super().__init__()` at the start of `__init__` to initialise the attributes of the Overlay. This example is designed to go with our tutorial_2 overlay and adds a function to more easily call the multiplication function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "class TestOverlay(Overlay):\n",
    "    def __init__(self, bitfile, **kwargs):\n",
    "        super().__init__(bitfile, **kwargs)\n",
    "    \n",
    "    def multiply(self, stream, constant):\n",
    "        return self.const_multiply.stream_multiply(stream, constant)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To test our new overlay class we can construct it as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ContiguousArray([ 8, 12, 16, 20, 24], dtype=uint32)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "overlay = TestOverlay('/home/xilinx/tutorial_2.bit')\n",
    "overlay.multiply([2,3,4,5,6], 4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Included Drivers\n",
    "\n",
    "The pynq library includes a number of drivers as part of the `pynq.lib` package. These include\n",
    "\n",
    "* AXI GPIO\n",
    "* AXI DMA (simple mode only)\n",
    "* AXI VDMA\n",
    "* AXI Interrupt Controller (internal use)\n",
    "* Pynq-Z1 Audio IP\n",
    "* Pynq-Z1 HDMI IP\n",
    "   * Color convert IP\n",
    "   * Pixel format conversion\n",
    "   * HDMI input and output frontends\n",
    "* Pynq Microblaze program loading"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
